ABSTRACT

This preliminary investigation grew out of the
Project English study of written composition at the University of
Georgia from 1963 through 1968. Toward the end of that project,
attention was given to possible relationships between the quality of
written composition and certain syntactic measures in student writing
produced in schools using the project-developed materials. In
summary, the syntactic measures studied clearly distinguish between
high and low quality writing in the second, fourth, and sixth grades.
A general implication of these findings is that the teaching of
structural options to enhance maturity in writing might also, at the
elementary level, enhance quality. (TO)
Introduction. The focus of this preliminary investigation grew out of
a larger study of the written composition of young children; namely, the
"Project English" study of written composition at the University of Georgia
(1963 to 1968). Specifically, the Georgia English Curriculum Study Center
developed materials for an elementary school curriculum in written composition.
Part of that curriculum project included the development of an instrument to
assess globally the quality of writing produced in schools using the Study
Center materials. Toward the end of the project, as data reflecting quality
began to come in, attention was directed to possible relationships between
levels of rated quality and certain syntactic measures. This report, therefore,
is a summary of those preliminary investigations.

As several researchers (Loban, 1963; Hunt, 1965; O'Donnell, 1967) have
observed, syntactic measures indicate maturity in writing. Hunt, who has used
the minimal terminal syntactic unit¹ as a basic unit of measurement in syntactic
analysis, has noted significant differences between grades four, eight, and
twelve. O'Donnell found differences between grades three, five, and seven.
In each case, however, the measurement involved structural or syntactic
complexity and not estimated quality.

Mellon (1967), Bateman and Zidonis (1966), O'Hare (1973), and many other
researchers have employed syntactic measures as criteria in experimental studies
of written composition. Except for Mellon's and O'Hare's, however, these
and similar investigations have not included overall estimates of quality.
Moreover, they have concentrated on the sentence as the unit of measurement
and analysis. And even though they have demonstrated that a more felicitous
use of a greater variety of structures can be taught, they have not shown
that a direct connection exists between syntactic complexity and overall
quality. The one sub-sample that Mellon checked for both quality and sentence
structure revealed no statistically significant correlations between the two
variables of syntax and quality. O'Hare's experimental group, however, the one
that demonstrated greater syntactic maturity, also produced writing that was
judged to be significantly better than that of the control group, the one with
normal syntactic maturity. Still, specific aspects of syntax were not compared
with overall quality.

¹Generally referred to as the T-unit, it is described as one main clause and
its modifiers--the shortest grammatically allowable unit into which an essay
can be segmented without leaving residue.

In short, syntactic measures are objective, and they do reflect maturity
in writing, maybe even quality. It was therefore the purpose of this study to
compare observed differences in the estimated quality of writing and differences
in syntactic complexity. Furthermore, it was assumed that the degree to which
a syntactic measure correlated with global ratings would be one index to the
validity of the developed instrument.

Test Development. First, a brief description of the developed instrument
is in order. In the view of several researchers (Braddock, 1963; Findley,
1963; Derrick, 1964), evaluation of children's writing should focus upon the
actual composition itself--the child's written product. One possibility for
such evaluation is the product-scale class of instrument, an example of which is
the S.T.E.P. Essay Test. It evaluates the "product" (that is, the student's
written composition) by comparing it with other "products" to derive a relative
measure of its merit. In the development of a product-scale composition
instrument, products which are actual samples of student writing must be selected
to serve as models. In order to provide the necessary controls in developing these
models, it is important that the models reflect common criteria, that they be
selected by trained raters, that they come from several samples, and, above all,
that these models can then be used by minimally trained raters when the writing
samples to be evaluated are produced under standardized examination situations
paralleling those of the models.

Essentially, the test-development task was to construct a product-scale
instrument that would yield reliable estimates of the overall quality of
writing samples produced by elementary school children. Thus, it was necessary
to begin by collecting samples of pupil writing from which model or comparison
essays could be selected. Steps in this process included:

1) reviewing the literature to determine which criteria were
typically used in evaluating written composition at the
elementary level (33 were found);

2) identifying from this list those criteria that were to
be used in selecting the model essays--nine criteria were
selected by 32 cooperating teachers who ranked the criteria
as to importance;

3) obtaining under standardized conditions writing samples
from classes cooperating with the English Curriculum Study
Center;

4) rating these papers, according to the selected criteria,
using trained raters on a seven-point scale;

5) selecting from these rated papers reliably rated (.87)
comparison essays that represent--according to the
criteria--high, average, and low quality (points six, four,
and two) on the rating scale;

6) obtaining additional writing samples on each of the
eight different topics to use in a reliability check;
and, finally,

7) rating these additional papers, using both trained and untrained
raters as well as the selected comparison essays for points of
reference on the rating scale.

Rater reliability data were collected from several groups of raters on
several separate occasions over more than two years. Included in the estimate
of rater reliability were trained raters and raters with only a minimal amount
of instruction. Data were collected to show the reliability of one rater,
the reliability among different numbers of raters, the inter-reliability of topics
or forms, and the reliability of both trained and untrained raters.

Inter-rater reliability was found to be high (.70 or better). In fact,
it was significantly higher than that achieved in typical ratings made by
English teachers (.50) and comparable to that typically reported (.70) for
trained raters (Diederich, 1964). Test-retest reliability coefficients for
the different forms ranged from .58 to .89, and a single test-retest check of
reliability on the same form proved to be .71.

Validity was assessed by comparing ratings made by the criteria method
with those made by the comparison method. Three trained raters rated the
essays by both methods. For the criteria method, reliability coefficients with
a range of .49 to .86 and a mean of .66 were obtained. For the comparison method,
with the same raters and the same papers, coefficients with a range of .52 to .80
and a mean of .64 were obtained. Differences between inter-rater reliability for
the two methods were not statistically significant. Both methods produced reliable
ratings, and overall agreement between the two methods (.79) provides at least one
estimate of validity.
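
As an illustration of how an inter-rater coefficient of this kind can be computed, the following is a minimal sketch. The report does not say which coefficient was used; the sketch assumes a Pearson correlation averaged over every pair of raters. The function names and the ratings are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings.

    Raises ZeroDivisionError if either rater gave every paper the same score.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_reliability(ratings_by_rater):
    """Average the correlation over every pair of raters (an assumed method)."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical seven-point ratings: three raters scoring the same five papers.
ratings = [
    [2, 4, 6, 3, 5],
    [3, 4, 6, 2, 5],
    [2, 5, 7, 3, 4],
]
print(round(mean_pairwise_reliability(ratings), 2))
```

The same `pearson` function could be applied to the mean ratings produced by the criteria method and by the comparison method to obtain an agreement figure between methods.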

For each form, the ratings by comparison were higher than the ratings by
criteria, and for three forms the differences were significant. Both methods led
to essentially the same conclusions with regard to a ranking [illegible in
source]; the two methods led to different conclusions about the [illegible in
source].



On this particular test of validity, the developed instrument provided a valid
index to the relative quality of the sample essays but did not yield estimates
of particular levels of quality (1-7) comparable to estimates based on criteria.

The syntax study, it should be noted, was based on the relative quality--
and not the absolute quality--of the papers.

Sample for Syntax Study. In another school (that is, one not cooperating with
the ECSC), eight forms of the developed instrument were administered to all
second, fourth, and sixth grade children: four forms in the fall of 1967 and four
in the spring of 1968. Twenty-seven children were selected, in random fashion,
from each of the three grades. Thus, eight papers from 81 children (648 papers
in all) were available for rating and analysis. All papers were rated by three
raters and an average taken. These averages were then summed across all eight
papers to provide a single estimate of quality. Finally, these estimates of
quality were ranked and divided into three equal groups (high, middle, and low)
for each of the three grades. The selected sample, therefore, had equal-size
groups of children in each of three levels (H, M, L) for each of three grades
(6, 4, 2). Table I shows this grouping.

                   TABLE I

           NUMBERS OF CHILDREN BY
               LEVEL AND GRADE

Rated Quality              Grade
Level                 2      4      6

High                  9      9      9
Middle                9      9      9
Low                   9      9      9
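
The grouping procedure described above (average the three raters on each paper, sum the averages over the eight papers, rank, and split into thirds) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the ratings are randomly generated rather than taken from the study, and ties are broken by input order, a detail the report does not specify.

```python
import random


def composite_quality(papers):
    """Sum, over a child's papers, the mean of the raters' scores on each paper."""
    return sum(sum(scores) / len(scores) for scores in papers)


def split_into_levels(ratings_by_child):
    """Rank children by composite quality and cut the ranking into three equal groups.

    Assumes the number of children is divisible by three, as it is for
    the 27 children per grade described in the text.
    """
    composites = {child: composite_quality(p) for child, p in ratings_by_child.items()}
    ranked = sorted(composites, key=composites.get, reverse=True)
    size = len(ranked) // 3
    return {"H": ranked[:size], "M": ranked[size:2 * size], "L": ranked[2 * size:]}


# Hypothetical data for one grade: 27 children, 8 papers each, 3 raters per paper,
# scored on a seven-point scale.
rng = random.Random(0)
one_grade = {
    f"child_{i:02d}": [[rng.randint(1, 7) for _ in range(3)] for _ in range(8)]
    for i in range(27)
}
levels = split_into_levels(one_grade)
print({level: len(members) for level, members in levels.items()})
```

Running the split once per grade yields the nine equal-size cells (three levels by three grades) described in the text.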



Additional data on the children showed that the grade groupings were not
significantly different in either mental or reading ability. Also, each grade
showed a normal distribution and range in both I.Q. scores (mean of 104) and
grade placement reading scores (means at or above grade level). The possible
relationships between I.Q. scores, reading scores, rater estimates of quality,
and the several syntactic measures constitute another study that we have projected
to follow this preliminary investigation into quality and syntax. These subject
variables, since they did not differentiate groups, are not considered in the
present study of syntax.

It should also be noted that a composite of eight papers, as used in this
study, provides as reliable an estimate of rated quality as one is likely to get
(Diederich, 1964). Also, the syntactic data are comparable to those of O'Donnell's
(1967) similar sample. In short, these procedures provided for a cross section
of grade-level and rated-quality writing of elementary age children, writing
that could then be looked at syntactically.

Procedures for Syntax Study. First of all, each paper was segmented into
T-units. Frequency counts, based on a composite of all eight papers, were then
made² of the following:

1) total words

2) total T-units

3) words per T-unit

4) subordinate clauses

5) total clauses

6) clauses per T-unit

7) coordinators of T-units

²Acknowledgment is here given to Dan Ward who, as part of his thesis study
at the University of Georgia, made these counts.



These counts are reported in Table 2.

For the sake of comparison, those counts were translated to a base of 100
T-units and are reported in Table 3. One further step involved a breakdown of
the subordinate clauses into nominal, adverbial, and adjective clauses. This
breakdown is shown in Table 4.
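
To make the derived measures concrete, the following minimal sketch computes words per T-unit, clauses per T-unit, and words per clause from raw counts, and rescales a count to a base of 100 T-units. The class name, field names, and the numbers are hypothetical and are not drawn from Tables 2 through 4.

```python
from dataclasses import dataclass


@dataclass
class SyntaxCounts:
    """Raw frequency counts for one group of papers (hypothetical container)."""
    total_words: int
    t_units: int
    subordinate_clauses: int
    total_clauses: int

    def words_per_t_unit(self):
        return self.total_words / self.t_units

    def clauses_per_t_unit(self):
        return self.total_clauses / self.t_units

    def words_per_clause(self):
        return self.total_words / self.total_clauses

    def per_100_t_units(self, count):
        """Translate a raw count to a common base of 100 T-units."""
        return count * 100 / self.t_units


# Hypothetical counts, chosen only so the arithmetic is easy to follow.
group = SyntaxCounts(total_words=640, t_units=80,
                     subordinate_clauses=20, total_clauses=100)
print(group.words_per_t_unit())      # 640 / 80
print(group.clauses_per_t_unit())    # 100 / 80
print(group.words_per_clause())      # 640 / 100
print(group.per_100_t_units(group.subordinate_clauses))  # 20 * 100 / 80
```

Rescaling to 100 T-units removes the effect of sheer output, so that groups that wrote different amounts can be compared on how often a structure occurs.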

Differences between levels and grades were checked for statistical
significance and are reported in Tables 5 and 6.

Results of the Syntactic Study. As several researchers (Biesbrock, 1968;
Cartwright, 1968; and Martin, 1968) have shown, when evaluation of written
composition is based on a timed sample, length of composition is likely to be
a factor in rated quality. In this study, significant differences in composition
length (total words) were found between all grades and between most levels of
quality. Generally speaking, the same situation was found to be true of the total
number of T-units and clauses, except at the thresholds between grades--that is,
between high second and low fourth grades and between high fourth and low sixth
grades (see Table 2). Also, as both Hunt (1965) and O'Donnell (1967) have shown,
T-unit and clause length are significant indices of maturity--that is, they
correlate highly with grade level.

In this study, too, clause length, T-unit length, and number of clauses
per T-unit steadily increased (at a statistically significant level) from grade to
grade. In fact, the data correspond almost precisely with O'Donnell's (1967)
cross-sectional sample. Thus, the writing specimens under consideration were
reliably rated as to global quality, and they also reflected the usual distinctions
in structural complexity. The questions, then, of this preliminary study centered
around possible connections between reliable indices of structural complexity
and reliable estimates of quality.



Table 5 summarizes syntactic distinctions between levels of quality within
grades. It shows that T-unit length was related to differing quality levels at
each of the three grades. It distinguished differently, however, for each of the
grades. At grade two, it distinguished between low and middle and between low and
high but not between middle and high. At the fourth grade it distinguished only
between low and high levels; and, at the sixth grade, it distinguished between
middle and high and between low and high but not between low and middle. For all
grades, though, an increase of at least one word in T-unit length differentiated
between low and high levels of quality.

An increase in the number of subordinate clauses clearly distinguished between
all levels of quality except between low and middle second grade groups. Apparently
this increase in subordinate clauses accounted for much of the increase in T-unit
length, for, as Table 5 shows, clause length differentiated only between low and
high second grades. Also, as Tables 4 and 5 indicate, the increased number of
subordinate clauses included all kinds of clauses at all levels and grades; but
adverbials increased more than did either nominal or adjective clauses, both of
which remained rather stable for the second and fourth grades, increasing mainly
at grade six.

The nominal clauses, as Table 4 suggests, included very little dialog
except at the sixth grade, high level. In addition, the positions of the adverbial
clauses showed a balanced increase through all grades and levels, with the medial
position exhibiting the highest percentage increase.

Hunt (1965) observed that the use of simple coordinators of T-units, like "and"
and "but," is a mark of immature writing, and Potter (1967) noted the same
characteristic in identifying the structures of "good" and "poor" writing at the
tenth grade. The absence of such coordinators in this sample clearly marked the
writing of high fourth graders and all levels of quality at the sixth grade.






Table 6 lists significant distinctions between levels of quality across
grades. As the table indicates, words per T-unit, words per clause, and clauses
per T-unit distinguished between almost all levels of quality, with words per
T-unit significantly making 22 of 27 possible distinctions in quality. Words per
clause made 20 such distinctions, while clauses per T-unit made 17. Thus, T-unit
length proved to be the most effective syntactic marker of quality, although two
related factors, clause length and number of clauses, also proved effective.
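
The report does not spell out how the 27 possible distinctions arise. One reading consistent with that figure, offered here only as an assumption, is that every quality level in one grade is compared with every quality level in each other grade: three pairs of grades times nine pairs of levels. The short sketch below enumerates those comparisons; the names are hypothetical.

```python
from itertools import combinations, product

GRADES = (2, 4, 6)
LEVELS = ("L", "M", "H")


def cross_grade_comparisons():
    """List every pairing of a (grade, level) group with a group in another grade."""
    return [
        ((grade_a, level_a), (grade_b, level_b))
        for grade_a, grade_b in combinations(GRADES, 2)
        for level_a, level_b in product(LEVELS, repeat=2)
    ]


comparisons = cross_grade_comparisons()
print(len(comparisons))  # 3 grade pairs x 9 level pairs
```

Under this reading, a measure that makes 22 of 27 distinctions separates the two groups in 22 of these pairings at a statistically significant level.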

These structural characteristics of sentences distinguished more levels of
quality at the second and fourth grades than at the sixth. In fact, the syntax
of low quality sixth grade writing was shown to be not essentially different
from all levels of the fourth grade. In the breakdown of clauses, in particular,
as much variation was found within grades as between them.

Conclusions and Implications. In summary, the syntactic measures studied
clearly distinguished between high and low quality writing in the second, fourth,
and sixth grades. Also, the addition of subordinate clauses seemed to account more
for differing levels of quality within grades than for differences in quality or
syntax between grades. Moreover, no one kind of clause seemed to account for
qualitative differences more than any other; and, for adverbials, position did
not appear to distinguish quality. In short, the same syntactic measure (T-unit
length) that has recently been shown to identify maturity in writing appears also
to distinguish at least two--and, in some cases, three--levels of quality in
these elementary grades.

A general implication of these findings is that the teaching of structural
options to enhance maturity in writing might also, at the elementary level,
enhance quality.